-
-
Notifications
You must be signed in to change notification settings - Fork 424
Makes document attachment usage consider external attachments #1600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is dependent on the UI PR being merged. This needs to be rebased once that is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if we need some tests on just adding and removing attachments and checking that stats update correctly, when those attachments are external. I see some attachment accounting tests in test/nbrowser/DocumentUsage.ts in grist-saas. If those aren't entangled too much with billing would it be worth bringing them across? Or are there already tests?
app/server/lib/ActiveDoc.ts
Outdated
const attachments = this.docData?.getMetaTable("_grist_Attachments").getRecords(); | ||
for (const attachmentRec of attachments ?? []) { | ||
const newSize = newFileSizesByFileIdent.get(attachmentRec.fileIdent); | ||
if (newSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about checking if the size is different from the old recorded size? I believe the data engine will prune changes where a value is written to a cell that already contains that value, but seems easy to cut it out at the source here.
test/nbrowser/DocumentUsage.ts
Outdated
import {server} from 'test/nbrowser/testServer'; | ||
import {setupTestSuite} from 'test/nbrowser/testUtils'; | ||
|
||
describe('DocumentUsage', function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this test over using DIFF? I see that you've changed the content of this file, and I don't have tools to show me the what was changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can tag which bits are the same? I've done a restructuring on this file to make testing with external attachments easier, so I'm not sure a diff would show many useful things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did a rebase that should make this easier.
This commit has all the DocumentUsage changes in:
c90b777
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New commit hash, had to do a rebase to sort a merge issue:
aacb5ee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you maybe add another file for testing only external attachments and leave this one untouched?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these tests fail if untouched 😅
I can isolate and duplicate the attachments tests if you want, and ensure they pass under both external and internal.
But I'm also not seeing the harm in having a pure-core version of this file? Happy to discuss further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion:
- Several of the non-attachments related tests will be moved to a new PR
- This file will be renamed
test/nbrowser/DocumentUsage.ts
Outdated
async function makeSessionAndLogin() { | ||
// login() needs an options object passing to bypass an optimization that causes .login() | ||
// to think we're already logged in when we're not after using `server.restart()`. | ||
// Without this we end up with bad credentials. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you end up with old credentials, not bad. You can be logged into grist using multiple accounts at the same time, this is a feature :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a slight rephrasing.
The original session ends up with old credentials, you're right! But any new session ends up with bad credentials - specifically the bearer token ends up as api_key_for_chimpy
, which just gives us the wrong responses from the API 😅
e0a0f53
to
871b9e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hide whitespace changes for this diff - it'll make it much smaller. A lot of it was just indentation changes.
6e4d36f
to
b2c0483
Compare
# Conflicts: # test/nbrowser/DocumentUsage.ts
# Conflicts: # test/nbrowser/AttachmentsTransfer.ts
b2c0483
to
bc5ae34
Compare
test/nbrowser/gristUtils.ts
Outdated
@@ -3618,6 +3620,59 @@ export function withEnvironmentSnapshot(vars: Record<string, any>) { | |||
}); | |||
} | |||
|
|||
export function enableExternalAttachments(transferDelay?: string, preserveEnvVars = true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't found usage of the second argument, and this whole function looks very similar to the one above.
What do you think about:
Changing line 3611 to:
process.env[key] = typeof vars[key] === 'function' ? await vars[key]() : vars[key];
Changing this function to
export function enableExternalAttachments(transferDelay?: string) {
withEnvironmentSnapshot({
GRIST_EXTERNAL_ATTACHMENTS_MODE: 'test',
GRIST_TEST_TRANSFER_DELAY: transferDelay ?? "",
GRIST_TEST_ATTACHMENTS_DIR: async () => mkdtemp(path.join(await createTmpDir(), 'attachments'))
});
}
And just reading process.env.GRIST_TEST_ATTACHMENTS_DIR whenever we need this information in tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main difference between this function and withEnvironmentSnapshot
is it only touches specific variables, rather than the whole environment.
The reason is there's a lower chance of accidental issues / conflicts this way, such as when multiple environment snapshots are in use!
That's also why it has that second parameter - to disable the environment restore if it's causing problems.
Let me see if I can deduplicate this a bit though, if that's the main issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've integrated all the behaviour I wanted into EnvironmentSnapshot
and withEnvironmentSnapshot
.
You can now pass them specific variables you want to save/set, and it'll save / restore only those variables, minimising the chance of conflicting changes.
The key reason I want this here specifically is because with gu.enableExternalAttachments
, it's not apparent that the environment is being modified - increasing the risk of an accidental problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so sorry for the comment above, I'm lost here a little bit with the logic you created. For me this was a simple stack, there was only one env at a time, it was a snapshot not a patch, there are no multiple environment in use (I don't know even what that means for you).
I don't get it, why do you need to touch only specific variables, and how do you create multiple process at the same time? I tested my proposition before, and it worked fine.
Can you revert those changes to the one you had before? Now it is very hard to understand what is going on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, I've moved the helper out to its own file. :)
} | ||
} | ||
|
||
const action: BulkUpdateRecord = ['BulkUpdateRecord', '_grist_Attachments', rowIdsToUpdate, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth checking if rowIdsToUpdate
is empty and returning earlier (without all external calls)?
app/server/lib/ActiveDoc.ts
Outdated
await this._applyUserActionsWithExtendedOptions( | ||
docSession, | ||
[action], | ||
{ attachment: true }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ attachment: true }, | |
{attachment: true}, |
Saas linter will mark this. We have a rule: objects without spaces, scopes with spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the warning, might be worth bringing that rule over to core!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do, I already tried and failed, but don't remember the specifics.
import path from 'path'; | ||
import * as gu from 'test/nbrowser/gristUtils'; | ||
import {fileDialogUpload, TestUser} from 'test/nbrowser/gristUtils'; | ||
import {server, setupTestSuite} from 'test/nbrowser/testUtils'; | ||
import * as testUtils from 'test/server/testUtils'; | ||
import {setupTestSuite} from 'test/nbrowser/testUtils'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a refactor, am I right? I dont see any actual code change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - the uses for server
and testUtils
were replaced by a call to gu.enableExternalAttachments
test/nbrowser/DocUsageTracking.ts
Outdated
|
||
it('shows usage stats on the raw data page', async function() { | ||
await session.tempNewDoc(cleanup, "EmptyUsageDoc"); | ||
await testDocUsageStatsAreZero(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to inline this method. It is not used anywhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep this one as is - it's not used here right now, because I moved the extra tests to another branch.
When I open a PR for that, this will be used in multiple places again :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, we usually avoid doing this that way. If it is not used right now in multiple plays, lets introduced this abstraction later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, will pull this out.
app/server/lib/ActiveDoc.ts
Outdated
fileSize: newFileSizesForRows | ||
}]; | ||
|
||
await this._applyUserActionsWithExtendedOptions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about wrapping it with try catch
? Asking because, this will throw if ids are not valid, it is very unlikely but still possible if the docData we have in memory is not exactly the same as the one in data engine, everything here is async, so maybe it can somehow interfere with other modifications.
Ideally this would all be done in "transaction like" scope, but I don't know if it is important to worry about that here. We have some mechanism for locking the doc modifications, but maybe this is overkill here. But there is other mechanism available - the upsert action BulkAddOrUpdateRecord
, where you can update those values based on the fileIdentity, not the ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a difficult question. The options seem to be:
- Don't catch, fail to import the attachments
- Catch, import the attachments with an incorrect file size
If this is very rare, I think I'd prefer to let it fail and force the user to re-import, than import attachments with an invalid file size.
I don't think upsert is an option unfortunately, as it's not correct to add a new record if it doesn't exist. Best case, we have an extra attachment being tracked that doesn't exist. Worst case, it breaks other code because it's not a fully valid record that's added (unless we try to reinsert the whole record...).
My inclination is to leave it, add a comment, see if it actually comes up in production.
How do other parts of the codebase handle the in-memory data and data engine being out of sync? Just locking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The minimum was to wrap this in try catch, and rethrow something more meaningful, explaining what has happen.
I don't know for sure, but we have some data corruptions problems we don't know how to fix or don't know the reason. So here, we know that the ids can go wrong, so we can anticipate it and explain in more details.
Since we don't do any performance or real load tests, waiting for this to happen on prod is a wrong idea, we don't know anything about how this would work on prod.
I just checked the docstring for the BulkAddOrUpdateRecord
, there is a on option parameter that disables adding if it can't find records to update, so I think it actually does look as a good option, wdyt? Here is a docstring:
`options` is a dictionary with optional settings to choose other behaviours:
- Set "on_many" to "all" or "none" to change which records are updated when several match.
- Set "update" or "add" to False to disable updating or adding records respectively,
i.e. if you only want to add records that don't already exist
or if you only want to update records that do already exist.
- Set "allow_empty_require" to True to allow `require` to be an empty dictionary,
which would mean that every record in the table is matched.
Otherwise this will raise an error to prevent mistakes like updating an entire column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this, but apparently that method doesn't support metatables
Instead, I've wrapped it in a more specific error, and added some logging around that error too. How does that look?
test/nbrowser/DocUsageTracking.ts
Outdated
gu.enableExternalAttachments(); | ||
|
||
async function makeSessionAndLogin() { | ||
// login() needs an options object passing to bypass an optimization that causes .login() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not needed now, right? I think we fixed this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem was avoided in a different way, by not restarting the server except at the start of the test file.
I can remove this as a result, but at the same time, it would be nice to document this behaviour somewhere. But not sure where.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I meant, that we actually fixed this issue in other PR, but I can't find it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My memory was that this was intended behaviour and we weren't changing it, but my memory might be wrong! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are those tests failures related to this PR?
This reverts commit aa16df7.
test/nbrowser/DocUsageTracking.ts
Outdated
enableExternalAttachmentsForTestSuite | ||
} from 'test/nbrowser/externalAttachmentsHelpers'; | ||
|
||
describe('DocumentUsage', function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
describe('DocumentUsage', function() { | |
describe('DocUsageTracking', function() { |
The convention is to have the same name as the file.
test/nbrowser/DocUsageTracking.ts
Outdated
await api.applyUserActions(docId, [['AddEmptyTable', "AttachmentsTable"]]); | ||
await gu.getPageItem('AttachmentsTable').click(); | ||
await gu.waitForServer(); | ||
await addAttachmentColumn('Attachments'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
await api.applyUserActions(docId, [['AddEmptyTable', "AttachmentsTable"]]); | |
await gu.getPageItem('AttachmentsTable').click(); | |
await gu.waitForServer(); | |
await addAttachmentColumn('Attachments'); | |
await gu.sendActions([['AddEmptyTable', "AttachmentsTable"]]); | |
await gu.getPageItem('AttachmentsTable').click(); | |
await gu.addColumn('Attachments', 'Attachment'); |
When you change the document through the API, you are not waiting for those change to propagate back to the browser. It works 99% of time, but introduce flakines.
The second waitForServer
is not needed I think, is it?
Your helper for adding attachments column is doing the same thing that this call:
await gu.addColumn('Attachments', 'Attachment');
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the advice, will put in these changes.
Not entirely sure if it's needed or not. I'll double check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made these changes, you're 100% right, that second waitForServer isn't needed - gu.sendActions waits :)
test/nbrowser/DocUsageTracking.ts
Outdated
await driver.find('.test-right-tab-field').click(); | ||
await gu.addColumn(columnName); | ||
await gu.setType(/Attachment/); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This helper is not needed, there is a type parameter in the addColumn
, which does the same thing as you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to know, removed it :)
Context
Documents' attachment usage currently only considers files which are stored internally.
If a document has external attachments, they won't show in the document's "Size of attachments" bar in "Raw Data".
Proposed solution
This updates the attachment usage calculations to use
_grist_Attachments.fileSize
for external attachments, while preserving the existing logic for internal attachments.Missing attachments
If a document has external attachments and is uploaded to a Grist instance, these missing attachments will still count towards the document's attachment usage. There's no easy way to mitigate this in all situations without checking all attachments on document open.
Untrusted file size value
This creates a problem with imported documents, which may have an incorrect
fileSize
value if it has been manually altered (i.e - the value is untrusted).To mitigate this, the
fileSize
value is now updated when attachments are uploaded as a .tar archive. This means that eventually,fileSize
is always valid for files which are present in external storage.Related issues
#1021
Has this been tested?